https://docs.pymc.io/notebooks/api_quickstart.html#4.1-Predicting-on-hold-out-data
In many cases you want to predict on unseen/hold-out data. This is especially relevant in Probabilistic Machine Learning and Bayesian Deep Learning. While we plan to improve the API in this regard, this can currently be achieved with a theano.shared variable. These are Theano tensors whose values can be changed later; otherwise, they can be passed into PyMC3 just like any other NumPy array or tensor.
This distinction matters because, internally, all models in PyMC3 are giant symbolic expressions. When you pass data directly into a model, you give Theano permission to treat it as a constant and optimize it away as it sees fit. If you later need to change that data, you might have no way to point at it inside the symbolic expression. A theano.shared variable gives you a handle to a place in that expression so you can change what is stored there.
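The difference between a baked-in constant and a shared variable can be sketched with a plain-Python analogy (no Theano required, and not Theano's actual mechanism): a function that reads from a mutable slot sees later updates, while one that captured a copy of the value does not.

```python
import numpy as np

# Rough analogy only: a "shared" slot that is read at call time,
# versus a value baked in as a constant when the function was built.
data = {"x": np.array([1.0, 2.0, 3.0])}  # mutable slot, like theano.shared

def mean_of_shared():
    return data["x"].mean()  # looks up the slot on every call

baked = data["x"].copy()  # captured once, like a plain array passed to a model
def mean_of_constant():
    return baked.mean()

print(mean_of_shared(), mean_of_constant())  # 2.0 2.0
data["x"] = np.array([10.0, 20.0, 30.0])     # analogous to x_shared.set_value(...)
print(mean_of_shared(), mean_of_constant())  # 20.0 2.0
```

After the update, only the function reading the mutable slot sees the new values, which is exactly the behavior needed for swapping in hold-out data below.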
In [1]:
import theano
import pymc3 as pm
import numpy as np
In [ ]:
x = np.random.randn(100)
y = x > 0
x_shared = theano.shared(x)
y_shared = theano.shared(y)
with pm.Model() as model:
    coeff = pm.Normal('x', mu=0, sd=1)
    logistic = pm.math.sigmoid(coeff * x_shared)
    pm.Bernoulli('obs', p=logistic, observed=y_shared)
    trace = pm.sample(50000)
In [10]:
pm.traceplot(trace, combined=True)
Out[10]:
In [4]:
x_shared.set_value([-1, 0, 1.])
y_shared.set_value([0, 0, 0])  # dummy values; the observed data are ignored when sampling the posterior predictive, only the shape matters
with model:
    post_pred = pm.sample_ppc(trace, samples=500)
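Conceptually, posterior predictive sampling is simple: repeatedly pick a parameter draw from the trace and simulate new observations from the likelihood. A minimal NumPy sketch of that loop, using made-up posterior draws in place of `trace['x']`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical posterior draws for the coefficient; a real run would
# use trace['x'] instead.
rng = np.random.default_rng(42)
trace_coeff = rng.normal(5.0, 0.5, size=2000)

x_new = np.array([-1.0, 0.0, 1.0])
samples = 500
# For each posterior predictive sample: pick one coefficient draw, compute
# the Bernoulli probability at each new x, then simulate 0/1 outcomes.
idx = rng.integers(len(trace_coeff), size=samples)
probs = sigmoid(trace_coeff[idx, None] * x_new)            # shape (500, 3)
post_pred_obs = (rng.random(probs.shape) < probs).astype(int)
print(post_pred_obs.shape)  # (500, 3)
```

The resulting array has one row per posterior predictive sample and one column per hold-out input, matching the shape of `post_pred['obs']`.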
In [5]:
post_pred['obs'].mean(axis=0)
Out[5]:
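Averaging the simulated 0/1 draws is a Monte Carlo estimate of the posterior-mean success probability at each new input. With posterior draws for the coefficient in hand, that quantity can also be computed directly (again with hypothetical draws standing in for `trace['x']`):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical posterior draws for the coefficient (stand-in for trace['x']).
rng = np.random.default_rng(0)
coeff_draws = rng.normal(5.0, 0.5, size=4000)

x_new = np.array([-1.0, 0.0, 1.0])
# Posterior mean of P(y = 1) at each new input: average the Bernoulli
# probability over posterior draws, which is what
# post_pred['obs'].mean(axis=0) estimates by simulation.
p_hat = sigmoid(coeff_draws[:, None] * x_new).mean(axis=0)
print(p_hat)  # the middle entry (x = 0) is exactly 0.5
```

Since `sigmoid(0) = 0.5` for every draw, the prediction at `x = 0` is exactly 0.5, while the endpoints sit near 0 and 1 for a strongly positive coefficient.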
In [6]:
x = np.random.randn(100)
y = x > 0
with pm.Model() as model:
    coeff = pm.Normal('x', mu=0, sd=1)
    logistic = pm.math.sigmoid(coeff * x)
    pm.Bernoulli('obs', p=logistic, observed=y)
    trace = pm.sample()
In [7]:
pm.traceplot(trace)
Out[7]:
In [8]:
with model:
    post_pred = pm.sample_ppc(trace, samples=500)
In [9]:
post_pred['obs'].mean(axis=0)
Out[9]: